home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Personal Computer World 2009 February
/
PCWFEB09.iso
/
Software
/
Linux
/
Kubuntu 8.10
/
kubuntu-8.10-desktop-i386.iso
/
casper
/
filesystem.squashfs
/
usr
/
share
/
doc
/
grep
/
TODO
< prev
Wrap
Text File
|
2005-11-08
|
3KB
|
99 lines
Get sane performance with UTF-8 locales.
1) rewrite the configure.in script, perhaps also Makefile.am
2) set up for gnulib-tool --import
3) improve the test infrastructure
4) check in the patches for the sync of dfa.c with GNU awk
5) other small patches which wait for a test case
6) process the Fedora/Red Hat patches
7) some _minimal_ cleanup of the grep(), grepdir(), recursion
(the "main loop") and fix --directories=read
##
Write Texinfo documentation for grep. The manual page would be a good
place to start, but Info documents are also supposed to contain a
tutorial and examples.
Fix the DFA matcher to never use exponential space. (Fortunately, these
cases are rare.)
Improve the performance of the regex backtracking matcher. This matcher
is agonizingly slow, and is responsible for grep sometimes being slower
than Unix grep when backreferences are used.
Provide support for the Posix [= =] and [. .] constructs. This is
difficult because it requires locale-dependent details of the character
set and collating sequence, but Posix does not standardize any method
for accessing this information!
##
Some test in tests/spencer2.tests should have failed !!!
Need to filter out some bugs in dfa.[ch]/regex.[ch].
Threads for grep ?
GNU grep does 32 bits arithmetic, it needs to move to 64.
Clean up, to many #ifdef's !!
Check some new Algorithms for matching, talk to Karl Berry and Nelson.
Sunday's "Quick Search" Algorithm (CACM 33, 8 August 1990 pp. 132-142)
claim that his algo. is faster then Boyer-More ????
Worth Checking.
Check <http://flame.cs.dal.ca/~taa/greps.html>.
Take a look at:
-- cgrep (Context grep) seems like nice work;
-- sgrep (Struct grep);
-- agrep (Approximate grep), from glimpse;
-- nr-grep (Nondeterministic reverse grep);
-- ggrep (Grouse grep);
-- grep.py (Python grep);
-- lgrep (from lv, a Powerful Multilingual File Viewer / Grep);
-- freegrep <http://www.vocito.com/downloads/software/grep/>;
-- ja-grep's mlb2 patch (Japanese grep);
-- pcregrep (from Perl-Compatible Regular Expressions library).
Can we merge ?
Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one binary.
POSIX Compliance see p10003.x
Moving away from GNU regex API for POSIX regex API.
Better and faster !!
##
Check POSIX:
-- Volume "Base Definitions (XBD)",
Chapter "Regular Expressions"
and in particular
Section "Regular Expression General Requirements"
and its paragraph about caseless matching.
Check the Unicode Standard:
-- Chapter 3 ("Conformance"),
Section 3.13 ("Default Case Operations")
and the toCasefold() case conversion operation;
-- Chapter 4 ("Character Properties"),
Section 4.2 ("Case -- Normative")
and the SpecialCasing.txt and CaseFolding.txt
files from the Unicode Database;
-- Chapter 5 ("Implementation Guidelines"),
Section 5.18 ("Case Mappings"),
Subsection "Caseless Matching".
Check Unicode Technical Standard #18 ("Unicode Regular Expressions").
Check Unicode Standard Annex #15 ("Unicode Normalization Forms").
##
Before every release:
-- drop dfa.[ch] into a copy of gawk and run "make check";
-- send pot file to the Translation Project to get fresh po files;
-- get up-to-date version of ABOUT-NLS;
-- update NEWS.